Handling Missing Values via a Neural Selective Input Model
Abstract
Missing data are a ubiquitous problem with numerous and diverse causes. Handling Missing Values (MVs) properly is a crucial issue, particularly in Machine Learning (ML) and pattern recognition. To date, the only option available for standard Neural Networks (NNs) to handle this problem has been to rely on pre-processing techniques, such as imputation, to estimate the missing data values, which has considerably limited the scope of their application. To circumvent this limitation we propose a Neural Selective Input Model (NSIM) that accommodates different transparent and bound models, while providing support for NNs to handle MVs directly. By embedding the mechanisms to support MVs we can obtain better models that reflect the uncertainty caused by unknown values. Experiments on several UCI datasets with different distributions and proportions of MVs show that the NSIM approach is very robust and yields good to excellent results. Furthermore, the NSIM performs better than state-of-the-art imputation techniques either with a higher prevalence of MVs in a large number of features or with a significant proportion of MVs, while delivering competitive performance in the remaining cases. We demonstrate the usefulness and validity of the NSIM, making it a first-class method for dealing with this problem.
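For context, the imputation pre-processing that the abstract contrasts with the NSIM can be sketched as follows. This is a minimal illustration of plain mean imputation, not the NSIM itself, whose mechanism the abstract does not describe. The function name `mean_impute`, the NaN encoding of missing entries, and the toy data are all assumptions made for this example.

```python
import math


def mean_impute(rows):
    """Replace NaN entries with the mean of the observed values in their column.

    Returns the imputed rows and a 0/1 mask marking which entries were missing.
    (Hypothetical helper for illustration; missing values are assumed to be NaN.)
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if not math.isnan(r[j])]
        # Fall back to 0.0 when a column has no observed values at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    imputed = [[means[j] if math.isnan(v) else v for j, v in enumerate(r)]
               for r in rows]
    mask = [[1 if math.isnan(v) else 0 for v in r] for r in rows]
    return imputed, mask


# Toy data: two features, one missing entry in each column.
data = [[1.0, float("nan")], [3.0, 4.0], [float("nan"), 8.0]]
imputed, mask = mean_impute(data)
print(imputed)  # [[1.0, 6.0], [3.0, 4.0], [2.0, 8.0]]
print(mask)     # [[0, 1], [0, 0], [1, 0]]
```

After imputation a standard network sees only the filled-in values. The mask is returned separately here to show what information is lost unless it is passed on explicitly.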
Similar resources
Missing Values in a Backpropagation Neural Net
An empirical study of methods of handling missing values in a backpropagation neural network is presented. Neural networks can be applied to many real world systems to perform classification, pattern recognition or prediction on the basis of input data. However, many such applications cannot guarantee that the data provided to the network will be complete. The backpropagation network does not l...
Recurrent Neural Networks for Missing or Asynchronous
In this paper we propose recurrent neural networks with feedback into the input units for handling two types of data analysis problems. On the one hand, this scheme can be used for static data when some of the input variables are missing. On the other hand, it can also be used for sequential data, when some of the input variables are missing or are available at different frequencies. Unlike in t...
Investigating the missing data effect on credit scoring rule based models: The case of an Iranian bank
Credit risk management is a process in which banks estimate the probability of default (PD) for each loan applicant. Data sets of previous loan applicants are built by gathering their data; these internal data sets are usually completed using external credit bureau data and finally used for estimating PD in banks. There is also continuing interest among banks to use rule based classifiers to b...
Recurrent Neural Networks for Missing or Asynchronous Data
In this paper we propose recurrent neural networks with feedback into the input units for handling two types of data analysis problems. On the one hand, this scheme can be used for static data when some of the input variables are missing. On the other hand, it can also be used for sequential data, when some of the input variables are missing or are available at different frequencies. Unlike in the cas...
Handling missing values in support vector machine classifiers
This paper discusses the task of learning a classifier from observed data containing missing values amongst the inputs which are missing completely at random. A non-parametric perspective is adopted by defining a modified risk taking into account the uncertainty of the predicted outputs when missing values are involved. It is shown that this approach generalizes the approach of mean imputation ...